    Multi-tenant Pub/Sub processing for real-time data streams

    Devices and sensors generate streams of data across a diversity of locations and protocols. That data usually reaches a central platform that is used to store and process the streams. Processing can be done in real time, with transformations and enrichment happening on the fly, but it can also happen after data is stored and organized in repositories. In the former case, stream processing technologies are required to operate on the data; in the latter, batch analytics and queries are commonly used. This paper introduces a runtime to dynamically construct data stream processing topologies based on user-supplied code. These dynamic topologies are built on the fly using a data subscription model defined by the applications that consume data. Each user-defined processing unit is called a Service Object. Every Service Object consumes input data streams and may produce output streams that others can consume. The subscription-based programming model enables multiple users to deploy their own data-processing services. The runtime handles the dynamic forwarding of data and the execution of Service Objects from different users. Data streams can originate in real-world devices or they can be the outputs of Service Objects. The runtime leverages Apache Storm for parallel data processing, which, combined with dynamic user-code injection, provides multi-tenant stream processing topologies. In this work we describe the runtime, its features and implementation details, and include a performance evaluation of some of its core components. This work is partially supported by the European Research Council (ERC) under the EU Horizon 2020 programme (GA 639595), the Spanish Ministry of Economy, Industry and Competitivity (TIN2015-65316-P) and the Generalitat de Catalunya (2014-SGR-1051).
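
    The subscription model described above lends itself to a compact illustration. The following is a minimal, in-process sketch of how Service Objects and a forwarding runtime could look; the class names, method signatures, and stream names are assumptions for illustration only, not the paper's actual API (which runs on top of Apache Storm).

    class ServiceObject:
        """User-supplied processing unit: consumes input streams, may emit an output stream."""

        def __init__(self, output_stream, subscriptions):
            self.output_stream = output_stream    # name of the stream this unit produces
            self.subscriptions = subscriptions    # names of the input streams it consumes

        def process(self, stream_name, record):
            """Override with user code; return an output record or None."""
            raise NotImplementedError

    class Runtime:
        """Minimal in-process stand-in for the dynamic forwarding the paper's runtime performs."""

        def __init__(self):
            self.subscribers = {}    # stream name -> list of subscribed Service Objects

        def register(self, service_object):
            for stream in service_object.subscriptions:
                self.subscribers.setdefault(stream, []).append(service_object)

        def publish(self, stream_name, record):
            # Forward the record to every Service Object subscribed to this stream,
            # then recursively publish whatever those objects emit.
            for so in self.subscribers.get(stream_name, []):
                out = so.process(stream_name, record)
                if out is not None:
                    self.publish(so.output_stream, out)

    # Example: one tenant deploys a unit that enriches raw sensor readings,
    # another subscribes to the enriched stream and consumes it.
    class CelsiusToFahrenheit(ServiceObject):
        def process(self, stream_name, record):
            return {**record, "temp_f": record["temp_c"] * 9 / 5 + 32}

    class Printer(ServiceObject):
        def process(self, stream_name, record):
            print(stream_name, record)    # terminal consumer; emits nothing

    runtime = Runtime()
    runtime.register(CelsiusToFahrenheit("sensors.fahrenheit", ["sensors.raw"]))
    runtime.register(Printer("sink", ["sensors.fahrenheit"]))
    runtime.publish("sensors.raw", {"device": "d1", "temp_c": 21.0})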

    New Beginnings for Nontraditional Students


    Building Plans at the School of Law


    A Devotion to the Law


    Senator Sarbanes at Chevy Chase Club


    ALOJA: A benchmarking and predictive platform for big data performance analysis

    The main goals of the ALOJA research project from BSC-MSR are to explore and automate the characterization of the cost-effectiveness of Big Data deployments. The development of the project over its first year has resulted in an open-source benchmarking platform, an online public repository of results with over 42,000 Hadoop job runs, and web-based analytic tools to gather insights about systems' cost-performance. This article describes the evolution of the project's focus and research lines after over a year of continuously benchmarking Hadoop under different configuration and deployment options, presents results, and discusses the technical and market-based motivations for such changes. During this time, ALOJA's target has evolved from low-level profiling of the Hadoop runtime, through extensive benchmarking and aggregate evaluation of a large body of results, to currently leveraging Predictive Analytics (PA) techniques. Modeling benchmark executions allows us to automatically estimate the results of new or untested configurations or hardware set-ups by learning from past observations, saving benchmarking time and costs. This work is partially supported by the BSC-Microsoft Research Centre, the Spanish Ministry of Education (TIN2012-34557), the MINECO Severo Ochoa Research program (SEV-2011-0067) and the Generalitat de Catalunya (2014-SGR-1051).
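
    As a rough illustration of the Predictive Analytics approach described above, the sketch below fits a regression model to past benchmark runs and uses it to estimate the runtime of an untested configuration. The feature names, values, and choice of model are assumptions for illustration; they are not taken from ALOJA's actual implementation or dataset.

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.feature_extraction import DictVectorizer

    # Hypothetical records of past benchmark executions (configuration -> runtime).
    past_runs = [
        {"mappers": 8,  "io_buffer_kb": 64,  "disk": "SSD", "net": "IB"},
        {"mappers": 16, "io_buffer_kb": 128, "disk": "HDD", "net": "ETH"},
        {"mappers": 32, "io_buffer_kb": 64,  "disk": "SSD", "net": "ETH"},
    ]
    exec_times = [410.0, 530.0, 365.0]    # seconds, made-up values for illustration

    # One-hot encode categorical options and keep numeric options as-is.
    vec = DictVectorizer(sparse=False)
    X = vec.fit_transform(past_runs)
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, exec_times)

    # Estimate an untested configuration instead of benchmarking it.
    candidate = {"mappers": 16, "io_buffer_kb": 64, "disk": "SSD", "net": "IB"}
    predicted_seconds = model.predict(vec.transform([candidate]))[0]
    print(f"predicted runtime: {predicted_seconds:.0f} s")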

    Generalized optimization models of linguistic laws

    Quantitative linguistics studies human language using statistical methods. It aims to build general theories from the statistical laws observed in a wide variety of languages. As part of the scientific method, these theories should be able to make novel predictions. This thesis is based on a family of models of human language. These models have been shown to reproduce linguistic laws, such as Zipf's law. They have also been used to make predictions, such as the biases present in child word learning. This family of models is based on the minimization of a cost function. The cost function is defined using a combination of information-theoretic measures on a bipartite graph of associations between words (or, more generally, forms) and meanings (more generally, counterparts). It balances the entropy of words against the mutual information of words and meanings. Entropy is a measure of surprisal, the cost of the speaker, and should be minimized. Mutual information is the amount of information obtained about a meaning when observing a word, the cost of the listener, and should be maximized. The model is then optimized with a Markov chain Monte Carlo method at zero temperature. This thesis is centered on two models belonging to this family, the "internal model" and the "external model", and makes several contributions in relation to them. The mathematical equations defining the models are derived, including dynamic equations which reduce the computational complexity of the optimization process. In addition, several techniques are introduced that aim to reduce the significant problem of numerical error due to floating-point arithmetic without compromising efficiency. Another contribution is the replication of results obtained by previous models in this family, which had originally been published with replicability issues. The models go through an optimization process, after which the linguistic laws they can predict are examined, as well as the degree to which they can be predicted. A key contribution is that these models are able to predict the relationship between the age of a word and its frequency. This prediction is robust and appears in all cases, with any combination of parameters. The effects of several initial conditions on the optimization process are also studied. Finally, a tool has been developed and released as open source so that others can easily replicate these results and investigate other properties of this family of models.
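
    To make the cost function concrete, the sketch below computes the entropy of words and the mutual information of words and meanings over a random bipartite adjacency matrix, and applies a greedy, zero-temperature Monte Carlo step. The weighting parameter lam and the exact functional form of the cost are assumptions for illustration; the thesis derives the actual definitions.

    import numpy as np

    rng = np.random.default_rng(0)
    n_words, n_meanings = 10, 10
    A = rng.integers(0, 2, size=(n_words, n_meanings))    # word-meaning association matrix

    def cost(A, lam=0.5):
        p_joint = A / A.sum()                  # joint probability of (word, meaning)
        p_word = p_joint.sum(axis=1)           # marginal over words
        p_meaning = p_joint.sum(axis=0)        # marginal over meanings
        with np.errstate(divide="ignore", invalid="ignore"):
            H_word = -np.nansum(p_word * np.log2(p_word))                      # speaker cost
            ratio = p_joint / np.outer(p_word, p_meaning)
            I = np.nansum(p_joint * np.log2(np.where(p_joint > 0, ratio, 1)))  # listener benefit
        # Assumed trade-off: minimize entropy, maximize mutual information.
        return lam * H_word - (1 - lam) * I

    def zero_temperature_step(A, lam=0.5):
        """Flip one random association and keep the change only if the cost does not increase."""
        i, j = rng.integers(n_words), rng.integers(n_meanings)
        B = A.copy()
        B[i, j] ^= 1
        if B.sum() > 0 and cost(B, lam) <= cost(A, lam):
            return B
        return A

    for _ in range(1000):
        A = zero_temperature_step(A)
    print("final cost:", cost(A))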

    Topology-aware GPU scheduling for learning workloads in cloud environments

    Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains including health care, autonomous vehicles, and the Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments. This paper presents a new topology-aware workload placement strategy to schedule deep learning jobs on multi-GPU systems. The placement strategy is evaluated with a prototype on a Power8 machine with Tesla P100 cards, showing speedups of up to ≈1.30x compared to state-of-the-art strategies; the proposed algorithm achieves this result by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems. This project is supported by the IBM/BSC Technology Center for Supercomputing collaboration agreement. It has also received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 639595). It is also partially supported by the Ministry of Economy of Spain under contract TIN2015-65316-P and Generalitat de Catalunya under contract 2014SGR1051, by the ICREA Academia program, and by the BSC-CNS Severo Ochoa program (SEV-2015-0493). We thank our IBM Research colleagues Alaa Youssef and Asser Tantawi for the valuable discussions. We also thank SC17 committee member Blair Bethwaite of Monash University for his constructive feedback on the earlier drafts of this paper.
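
    As a loose illustration of topology-aware placement, the sketch below scores candidate GPU sets by their pairwise interconnect cost and by potential interference with already-allocated GPUs, then picks the cheapest feasible set. The distance matrix, the scoring function, and the greedy search are hypothetical simplifications, not the algorithm evaluated in the paper.

    from itertools import combinations

    # Pairwise "communication cost" between GPUs (e.g. hop count on the GPU interconnect);
    # lower is better. Values are hypothetical for a 4-GPU node.
    DIST = [
        [0, 1, 2, 2],
        [1, 0, 2, 2],
        [2, 2, 0, 1],
        [2, 2, 1, 0],
    ]

    def placement_cost(gpus, busy):
        # Prefer tightly connected sets; penalize sets that share a fast link with busy GPUs.
        pair_cost = sum(DIST[a][b] for a, b in combinations(gpus, 2))
        interference = sum(1 for g in gpus for b in busy if DIST[g][b] == 1)
        return pair_cost + interference

    def place(num_gpus_needed, busy):
        """Pick the free GPU set with the lowest topology + interference cost."""
        free = [g for g in range(len(DIST)) if g not in busy]
        candidates = combinations(free, num_gpus_needed)
        return min(candidates, key=lambda gpus: placement_cost(gpus, busy), default=None)

    print(place(2, busy={0}))    # prefers the (2, 3) pair: directly linked, no interference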